6 research outputs found

    Design and optimization of approximate multipliers and dividers for integer and floating-point arithmetic

    Full text link
    The dawn of the twenty-first century has witnessed an explosion in the number of digital devices and data. While the emerging deep learning algorithms to extract information from this vast sea of data are becoming increasingly compute-intensive, traditional means of improving computing power are no longer yielding gains at the same rate due to the diminishing returns from technology scaling. To minimize the growing gap between computational demands and available resources, the paradigm of approximate computing is emerging as a potential solution. Specifically, resource-efficient approximate arithmetic units promise overall system efficiency, since most compute-intensive applications are dominated by arithmetic operations. This thesis primarily presents design techniques for approximate hardware multipliers and dividers. It presents two approximate integer multipliers and an approximate integer divider: an error-configurable minimally-biased approximate integer multiplier (MBM), an error-configurable reduced-error approximate log-based multiplier (REALM), and an error-configurable approximate integer divider (INZeD). The two multiplier designs and the divider design couple novel, mathematically formulated error-reduction mechanisms with the classical approximate log-based multiplier and divider, respectively. They exhibit very low error bias and offer Pareto-optimal error vs. resource-efficiency trade-offs when compared with state-of-the-art approximate integer multipliers and dividers. The thesis further presents designs of approximate floating-point multipliers and dividers. These designs use optimized versions of the proposed MBM and REALM multipliers for mantissa multiplication and the proposed INZeD divider for mantissa division, and they offer better design trade-offs than traditional precision scaling. Existing approximate integer dividers, as well as the proposed INZeD, suffer from unreasonably high worst-case error; the thesis therefore presents WEID, a novel lightweight method for reducing worst-case error in approximate dividers. Finally, the thesis presents a methodology for selecting approximate arithmetic units for a given application. The methodology is based on a novel selection algorithm and utilizes subrange error characterization of approximate arithmetic units, which characterizes error independently in different segments of the input range.
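    The multiplier and divider designs build on the classical logarithm-based (Mitchell) approximate multiplier mentioned in the abstract. The Python sketch below shows only that classical baseline, not the proposed MBM/REALM error-reduction mechanisms, to illustrate where the one-sided error that the thesis targets comes from.

```python
def leading_one_pos(n: int) -> int:
    """Index of the most significant set bit of a positive integer."""
    return n.bit_length() - 1

def mitchell_multiply(a: int, b: int) -> int:
    """Classical logarithm-based (Mitchell) approximate multiplication of two
    non-negative integers: log2(n) is approximated as k + x, where k is the
    leading-one position and x the remaining bits read as a fraction."""
    if a == 0 or b == 0:
        return 0
    ka, kb = leading_one_pos(a), leading_one_pos(b)
    xa = (a - (1 << ka)) / (1 << ka)   # fractional part of log2(a)
    xb = (b - (1 << kb)) / (1 << kb)   # fractional part of log2(b)
    s = xa + xb
    if s < 1.0:
        approx = (1 + s) * (1 << (ka + kb))
    else:
        approx = s * (1 << (ka + kb + 1))
    return int(approx)

# The approximation never exceeds the exact product and its relative error is
# bounded (roughly 11% in the worst case), i.e. it is biased toward zero.
print(mitchell_multiply(100, 200), 100 * 200)   # 18432 vs 20000
```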

    ApproxTrain: Fast Simulation of Approximate Multipliers for DNN Training and Inference

    Full text link
    Edge training of Deep Neural Networks (DNNs) is a desirable goal for continuous learning; however, it is hindered by the enormous computational power required for training. Hardware approximate multipliers have shown their effectiveness in improving the resource efficiency of DNN inference accelerators; however, training with approximate multipliers is largely unexplored. To build resource-efficient accelerators with approximate multipliers supporting DNN training, a thorough evaluation of training convergence and accuracy for different DNN architectures and different approximate multipliers is needed. This paper presents ApproxTrain, an open-source framework that allows fast evaluation of DNN training and inference using simulated approximate multipliers. ApproxTrain is as user-friendly as TensorFlow (TF) and requires only a high-level description of a DNN architecture along with C/C++ functional models of the approximate multiplier. We improve simulation speed at the multiplier level by using a novel LUT-based approximate floating-point (FP) multiplier simulator on GPU (AMSim). ApproxTrain leverages CUDA and efficiently integrates AMSim into the TensorFlow library in order to overcome the absence of native hardware approximate multipliers in commercial GPUs. We use ApproxTrain to evaluate the convergence and accuracy of DNN training with approximate multipliers for small and large datasets (including ImageNet) using LeNet and ResNet architectures. The evaluations demonstrate similar convergence behavior and a negligible change in test accuracy compared to FP32 and bfloat16 multipliers. Compared to CPU-based approximate multiplier simulations in training and inference, the GPU-accelerated ApproxTrain is more than 2500x faster. The original TensorFlow, based on the highly optimized closed-source cuDNN/cuBLAS libraries with native hardware multipliers, is only 8x faster than ApproxTrain.
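    The abstract describes AMSim as a LUT-based simulator for approximate FP multipliers but does not give its table construction or bit widths. The Python sketch below illustrates the general idea only, under assumed parameters (an 8-bit index per mantissa and a hypothetical lut_fp32_mul helper): sign and exponent are handled exactly, while the significand product is served from a precomputed table.

```python
import struct

K = 8  # assumed: number of top mantissa bits used to index the table

# Precomputed table of significand products (1.ma * 1.mb) -- the kind of LUT
# a simulator could index instead of multiplying full 24-bit significands.
SIG_LUT = [[(1 + i / (1 << K)) * (1 + j / (1 << K)) for j in range(1 << K)]
           for i in range(1 << K)]

def lut_fp32_mul(a: float, b: float) -> float:
    """Approximate float32 multiply: sign and exponents handled exactly,
    significand product looked up in SIG_LUT.  Zeros, subnormals and
    inf/NaN are ignored for brevity."""
    ai = struct.unpack('<I', struct.pack('<f', a))[0]
    bi = struct.unpack('<I', struct.pack('<f', b))[0]
    sign = -1.0 if ((ai ^ bi) >> 31) & 1 else 1.0
    ea = ((ai >> 23) & 0xFF) - 127          # unbiased exponent of a
    eb = ((bi >> 23) & 0xFF) - 127          # unbiased exponent of b
    ma = (ai >> (23 - K)) & ((1 << K) - 1)  # top K mantissa bits of a
    mb = (bi >> (23 - K)) & ((1 << K) - 1)  # top K mantissa bits of b
    return sign * SIG_LUT[ma][mb] * 2.0 ** (ea + eb)

print(lut_fp32_mul(3.5, -2.25), 3.5 * -2.25)   # approximate vs exact
```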

    A Power to Pulse Width Modulation Sensor for Remote Power Analysis Attacks

    Get PDF
    Field-programmable gate arrays (FPGAs) deployed on commercial cloud services are increasingly gaining popularity due to the cost and compute benefits they offer. Recent studies have discovered security threats that can be launched remotely on FPGAs that share the logic fabric between trusted and untrusted parties, posing a danger to designs deployed on cloud FPGAs. With remote power analysis (RPA) attacks, an attacker aims to deduce secret information present on a remote FPGA by deploying an on-chip sensor on the FPGA logic fabric. Information captured with the on-chip sensor must be transferred off the chip for analysis, and existing on-chip sensors demand a significant amount of bandwidth for this task as a result of their wide output bit width. However, attackers are often left with no option but a covert communication channel, and the bandwidth of such channels is generally limited. This paper proposes a novel area-efficient on-chip power sensor named PPWM that integrates a logic design outputting a pulse whose width is modulated by the power consumption of the FPGA. This pulse selectively and asynchronously clears a flip-flop, and the single-bit output of the flip-flop is used to perform an RPA attack. This paper demonstrates the possibility of successfully recovering a 128-bit Advanced Encryption Standard (AES) key within 16,000 power traces while consuming just 25% of the bandwidth of the state of the art. Moreover, this paper assesses the threat posed by the proposed PPWM to remote FPGAs, including those deployed on cloud services.
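    The abstract does not detail how the single-bit flip-flop output is post-processed into power traces. One plausible reading is that, across repeated measurements, the probability of the bit being set tracks instantaneous power. The toy Python model below (all names, waveforms and parameters are hypothetical, not taken from the paper) only illustrates why a 1-bit, low-bandwidth reading can still recover a usable power estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppwm_bit(power, threshold=0.5, noise=0.1):
    """Hypothetical 1-bit reading: the power-modulated pulse either clears the
    flip-flop or not, depending on instantaneous power and noise."""
    return (power + rng.normal(0.0, noise, power.shape) > threshold).astype(np.uint8)

# Unobservable per-sample power trace of one operation (toy waveform).
true_trace = 0.5 + 0.1 * np.sin(np.linspace(0, 4 * np.pi, 100))

# Repeating the measurement and averaging the single-bit readings yields a
# multi-bit estimate even though the sensor exports only one bit per sample.
repeats = 2000
bits = np.stack([ppwm_bit(true_trace) for _ in range(repeats)])
estimate = bits.mean(axis=0)

print(np.corrcoef(true_trace, estimate)[0, 1])   # close to 1.0
```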

    Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication

    No full text

    VITI: A Tiny Self-Calibrating Sensor for Power-Variation Measurement in FPGAs

    Get PDF
    On-chip sensors, built using reconfigurable logic resources in field-programmable gate arrays (FPGAs), have been shown to sense variations in signal propagation delay, supply voltage and power consumption. These sensors have been successfully used to deploy security attacks called remote power analysis (RPA) attacks on FPGAs. The sensors proposed thus far consume significant logic resources, and some of them could be used to deploy power viruses. In this paper, a sensor (named VITI) occupying a far smaller footprint than existing sensors is presented. VITI is a self-calibrating on-chip sensor design, constructed using adjustable delay elements, flip-flops and LUT elements instead of combinational loops, bulky carry chains or latches. Self-calibration enables VITI to adapt autonomously to differing situations (such as increased power consumption, temperature changes or placement of the sensor far away from the circuit under attack). The efficacy of VITI for power-consumption measurement was evaluated using RPA attacks, and the results demonstrate recovery of a full 128-bit Advanced Encryption Standard (AES) key with only 20,000 power traces. Experiments demonstrate that VITI consumes 1/4 and 1/16 of the area of state-of-the-art sensors such as time-to-digital converters and ring oscillators for similar effectiveness.
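    The abstract does not spell out VITI's calibration algorithm. Generically, a self-calibrating delay-line sensor adjusts its delay setting until the sampled bit sits near a 50% duty cycle, where the output is most sensitive to power-induced delay variation. The Python sketch below shows that generic feedback loop with an entirely hypothetical sensor model (sensor_bit, calibrate and all parameters are assumptions, not VITI internals).

```python
import numpy as np

rng = np.random.default_rng(1)

def sensor_bit(delay_setting, operating_point=37.0, jitter=1.5, n=64):
    """Hypothetical read-back: fraction of sampled 1s for a given adjustable
    delay setting; near the operating point the flip-flop captures the
    propagating edge only part of the time."""
    edges = operating_point + rng.normal(0.0, jitter, n)
    return float(np.mean(delay_setting > edges))

def calibrate(lo=0, hi=63, target=0.5):
    """Generic self-calibration: binary-search the delay setting so the duty
    cycle of the captured bit sits near 50%, the most sensitive point."""
    while lo < hi:
        mid = (lo + hi) // 2
        if sensor_bit(mid) < target:
            lo = mid + 1      # bit mostly 0: delay too short, increase it
        else:
            hi = mid          # bit mostly 1: delay too long, decrease it
    return lo

print(calibrate())   # settles near the sensitive operating point (~37 here)
```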
